Using a Virtual Event Space to Understand Parallel Application Communication Behavior
Authors
Abstract
For scientific applications run on clusters, communication performance becomes increasingly important as the number of cluster nodes increases. To understand the communication behavior, we have developed EventSpace, a configurable data collection, management and observation system for monitoring low-level synchronization and communication events. Applications are instrumented by adding data-collecting code, in the form of event collectors, to an application's communication paths. When triggered, these create virtual events and store them in a virtual event space. Based on the metadata describing the communication paths, virtual events can be combined to provide different views of the application's communication behavior. We used the data collected by EventSpace to do a post-mortem analysis of a wind-tunnel application, a river simulator, global clock synchronization, and a collective operation. The views allowed us to detect anomalous communication behavior, detect load balance problems, find hotspots in a collective communication structure, synchronize the Pentium timestamp counters on the cluster nodes, and analyze the accuracy of the synchronization.
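The instrumentation idea in the abstract can be illustrated with a minimal Python sketch. It is not the EventSpace implementation: the names `Event`, `EventSpace`, `event_collector` and `latency_view` are hypothetical, the event space is assumed to be a bounded in-memory buffer per collector, and a "view" is assumed to be nothing more than a mean latency per collector.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    collector_id: str  # hypothetical label for a collector on a communication path
    start: float       # clock value when the wrapped operation began
    end: float         # clock value when the wrapped operation returned


class EventSpace:
    """Hypothetical in-memory stand-in for a virtual event space."""

    def __init__(self, capacity=1024):
        # One bounded buffer per collector; the oldest events are dropped first.
        self._buffers = defaultdict(lambda: deque(maxlen=capacity))

    def store(self, event):
        self._buffers[event.collector_id].append(event)

    def events(self, collector_id):
        return list(self._buffers[collector_id])


def event_collector(space, collector_id, operation, clock=time.perf_counter):
    """Wrap a communication operation so that each call records one event."""
    def wrapped(*args, **kwargs):
        start = clock()
        try:
            return operation(*args, **kwargs)
        finally:
            # Record the event even if the operation raised.
            space.store(Event(collector_id, start, clock()))
    return wrapped


def latency_view(space, collector_ids):
    """Combine events from several collectors into a mean latency per collector."""
    view = {}
    for cid in collector_ids:
        durations = [e.end - e.start for e in space.events(cid)]
        if durations:
            view[cid] = sum(durations) / len(durations)
    return view
```

Wrapping an existing send routine, for example `send = event_collector(space, "node0.send", raw_send)` with a hypothetical `raw_send`, leaves the application code unchanged while every call deposits a timestamped event that a later view can read.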
Similar resources
Thirdspace: The Trialectics of the Real, Virtual and Blended Spaces
This article aims to redefine the concept of Thirdspace and to establish a trilateral relationship between the three concepts of real space, virtual space and the user. To do so, not only does the concept of Thirdspace have to be redefined, but a new understanding of virtual space as a relatively independent space is also required. This three-sided relation requires a new understanding of the relationship be...
Experiences Parallelizing, Configuring, Monitoring and Visualizing Applications for Clusters and Multi-Clusters
To make it simpler to experiment with the impact different configurations can have on the performance of a parallel cluster application, we developed the PATHS system. The PATHS system uses a “wrapper” to provide a level of indirection to the actual run-time location of data, making the data available from wherever threads or processes are located. A wrapper specifies where data is located, how to g...
A Review of the Concepts of Social Action and Isolation in Virtual Space
Cyberspace and its impact as the main competitor of real space in various aspects have been considered and studied by many thinkers and theorists. For various reasons (political, social, cultural, etc.), it has led to the presence of people, especially young people, in virtual space, where it crosses all borders and influences people's behavior and actions. According to the increasing importance and...
Distributed Rendering Techniques Using Virtual Walls
This framework presents a new generic solution to arbitrary geometrical problems using the Virtual Walls concept. It is based on the geometric distribution of the given object space by virtual walls into cells. The solution of the global problem is achieved by iteration of computation local to each cell and combination of neighbouring solutions. The Virtual Walls concept can be mapped efficiently...
Asynchronous Checkpointing for PVM Requires Message-Logging
Distributed computing using networked workstations offers cost-efficient parallel computing, but the higher rate of failure requires effective fault-tolerance. Asynchronous consistent checkpointing offers a low-overhead solution. Parallel Virtual Machine (PVM) allows a heterogeneous network of UNIX workstations to serve immediately as a distributed computer by providing message-passing services imp...